LIDS REPORT 2876
Weighted Bellman Equations and their Applications in Approximate Dynamic Programming ∗

Authors

  • Huizhen Yu
  • Dimitri P. Bertsekas
Abstract

We consider approximation methods for Markov decision processes in the learning and simulation context. For policy evaluation based on solving approximate versions of a Bellman equation, we propose the use of weighted Bellman mappings. Such mappings comprise weighted sums of one-step and multistep Bellman mappings, where the weights depend on both the step and the state. For projected versions of the associated Bellman equations, we show that their solutions have the same nature and essential approximation properties as the commonly used approximate solutions from TD(λ). The most important feature of our framework is that each state can be associated with a different type of mapping. Compared with the standard TD(λ) framework, this gives a more flexible way to combine multistage costs and state transition probabilities in approximate policy evaluation, and provides alternative means for bias-variance control. With weighted Bellman mappings, there is also greater flexibility to design learning and simulation-based algorithms. We demonstrate this with examples, including new TD-type algorithms with state-dependent λ parameters, as well as block versions of the algorithms. Weighted Bellman mappings can also be applied in approximate policy iteration: we provide several examples, including some new optimistic policy iteration schemes. Another major feature of our framework is that the projection need not be based on a norm, but rather can use a semi-norm. This allows us to establish a close connection between projected equation and aggregation methods, and to develop for the first time multistep aggregation methods, including some of the TD(λ)-type.

Oct 2012

∗ Work supported by the Air Force Grant FA9550-10-1-0412.
† Lab. for Information and Decision Systems, M.I.T. janey [email protected]
‡ Lab. for Information and Decision Systems, M.I.T. [email protected]
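To make the idea of a state-dependent λ parameter concrete, here is a minimal tabular policy-evaluation sketch using eligibility traces whose decay depends on the visited state. This is an illustrative toy, not the paper's algorithm: the Markov chain, costs, and per-state λ values are invented, and the trace-update convention (decay by γ·λ(s) at each visited state) is just one common choice.

```python
import numpy as np

def td_lambda_state_dep(P, r, gamma, lam, n_steps=200_000, alpha=0.01, seed=0):
    """Tabular TD(lambda) policy evaluation with a state-dependent lambda.

    P:   (n, n) transition matrix of the policy's Markov chain
    r:   (n,)   expected one-stage cost at each state
    lam: (n,)   per-state lambda in [0, 1]

    Returns an estimate of the cost-to-go vector J, where
    J(s) = E[ sum_t gamma^t r(s_t) | s_0 = s ].
    """
    n = len(r)
    rng = np.random.default_rng(seed)
    J = np.zeros(n)      # cost-to-go estimate
    z = np.zeros(n)      # eligibility trace
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n, p=P[s])
        delta = r[s] + gamma * J[s_next] - J[s]  # one-step TD error
        z *= gamma * lam[s]                      # state-dependent trace decay
        z[s] += 1.0                              # accumulating trace
        J += alpha * delta * z
        s = s_next
    return J
```

On a small chain the estimate can be checked against the exact solution of J = r + γPJ, i.e. J = (I − γP)⁻¹ r.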


Similar Resources

Optimizing Iran's Oil Production Path: An Optimal Control Dynamic Programming Model

In this article we present a dynamic programming model for oil production in Iran. To this end, we formulate the discounted-profit objective function as a Bellman equation, so that the problem can be viewed and solved as a dynamic programming problem. Specifically, in order to model the flow of liquids in oil tanks, we use differential eq...
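A discounted-profit Bellman equation of the kind described above can be illustrated with a deliberately stylized discrete extraction model, solved by value iteration. Everything here is hypothetical (integer reserve states, the price and cost parameters, the quadratic cost form); it is not the model from the article.

```python
import numpy as np

def optimal_extraction_values(max_reserve=10, price=1.0, cost=0.05,
                              beta=0.95, tol=1e-8):
    """Value iteration for a stylized discounted-profit extraction problem.

    State:  remaining reserve s in {0, ..., max_reserve} (integer units).
    Action: extract q in {0, ..., s}; per-period profit price*q - cost*q**2.
    Bellman equation: V(s) = max_q [ price*q - cost*q**2 + beta * V(s - q) ].
    """
    V = np.zeros(max_reserve + 1)
    while True:
        V_new = np.empty_like(V)
        for s in range(max_reserve + 1):
            qs = np.arange(s + 1)                       # feasible extractions
            V_new[s] = np.max(price*qs - cost*qs**2 + beta * V[s - qs])
        if np.max(np.abs(V_new - V)) < tol:             # sup-norm convergence
            return V_new
        V = V_new
```

Since the Bellman operator is a β-contraction in the sup-norm, the iteration converges geometrically; the resulting value function is nondecreasing in the reserve level, and an empty reserve is worth zero.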


A Comparison of Parametric Approximation Techniques to Continuous-Time Stochastic Dynamic Programming Problems

The views and interpretations expressed in these Reports are those of the author(s) and should not be attributed to any organisation associated with the EERH. Abstract 4 I. Introduction 5 II. A generalized stochastic optimal control problem in continuous-time setting 7 III. Parametric approximation approaches to HJB equations 8 IV. Case study 1: Unidimensional standard fishery problem 12 V. Cas...


Approximate Dynamic Programming for Ship Course Control

Dynamic programming (DP) is a useful tool for solving many control problems, but because of its computational complexity, traditional DP control algorithms are often unsatisfactory in practice. We must therefore look for a new method that retains the advantages of DP but is easier to compute. In this paper, an approximate dynamic programming (ADP) based controller system has been used to solve a ship he...


An Overview of Research on Adaptive Dynamic Programming

Adaptive dynamic programming (ADP) is a novel approximate optimal control scheme, which has recently become a hot topic in the field of optimal control. As a standard approach in the field of ADP, a function approximation structure is used to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. The approximate optimal control policy is obtained by using the offline iteration algo...
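A minimal discrete-time analogue of such a function-approximation scheme is fitted value iteration for a scalar linear-quadratic problem: fit V(x) ≈ p·x² by least squares after each Bellman backup. This is an illustrative sketch only; the dynamics, cost weights, sample grid, and quadratic feature choice are assumptions, not taken from the survey.

```python
import numpy as np

def fitted_vi_lqr(a=0.9, b=0.5, r=1.0, n_iter=200):
    """Fitted value iteration for the scalar problem x' = a*x + b*u,
    stage cost x**2 + r*u**2, approximating V(x) ~= p * x**2.

    Each sweep: (1) for every sample state, minimize over u the quantity
    x**2 + r*u**2 + p*(a*x + b*u)**2 (quadratic in u, so closed form);
    (2) refit the coefficient p to the backed-up targets by least squares.
    """
    xs = np.linspace(-1.0, 1.0, 21)   # sample states for the fit
    p = 0.0
    for _ in range(n_iter):
        # closed-form minimizer of the quadratic in u
        u = -(a * b * p) / (r + b * b * p) * xs
        targets = xs**2 + r * u**2 + p * (a * xs + b * u) ** 2
        # least-squares fit of targets against the single feature x**2
        phi = xs**2
        p = (phi @ targets) / (phi @ phi)
    return p
```

Because the backed-up targets are exactly quadratic here, the least-squares fit is exact and each sweep reproduces a scalar Riccati iteration, so p converges to the Riccati fixed point; with a richer problem or feature class the fit would only be approximate.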


Uniqueness Results for Second-Order Bellman--Isaacs Equations under Quadratic Growth Assumptions and Applications

In this paper, we prove a comparison result between semicontinuous viscosity sub- and supersolutions, growing at most quadratically, of second-order degenerate parabolic Hamilton-Jacobi-Bellman and Isaacs equations. As an application, we characterize the value function of a finite horizon stochastic control problem with unbounded controls as the unique viscosity solution of the corresponding dynam...



Journal title:

Volume   Issue

Pages  -

Publication date: 2012